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^\ A new procedure is proposed for the dimensional reduction of time series. Similarly 

to principal components, the procedure seeks a low-dimensional manifold that minimizes 
information loss. Unlike principal components, however, the new procedure involves 
' dynamical considerations, through the proposal of a predictive dynamical model in the 

I— I reduced manifold. Hence the minimization of the uncertainty is not only over the choice 

of a reduced manifold, as in principal components, but also over the parameters of the 

^0 dynamical model. Further generalizations are provided to non-autonomous and non- 

Markovian scenarios, which are then applied to historical sea-surface temperature data. 
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^ 1 Introduction 

On 

Complex systems typically involve a large number of degrees of freedom. Thus to eluci- 

date the fundamental mechanisms underlying one such system's behavior, one may consider 
T-H its projection onto smaller-dimensional manifolds, selected so as to capture as much of the 

dynamics as possible. A tool frequently used for this purpose is principal components [T], 
• • whereby a linear subspace of prescribed dimensionality of the phase-space of observations is 

sought, so as to maximize the amount of the variability that is preserved when the data are 
^ projected onto it. 

^ Given a dataset Zj, j £ [1, . . . , N], where each observation Zj consists of n real numbers, 

its first m (m < n) principal components are given by Xj = Q'x{zj — z), where z is the 
mean value of z, and Qx is an n x m matrix with orthonormal columns, chosen so that 
X^jLi ~ ^) ~ QxXjW"^ is as small as possible. From a statistical perspective, among all 
m-dimensional subspaces, x is the one whose knowledge minimizes the uncertainty of z. The 
matrix Qx consists of the first m columns of U in the singular value decomposition 

z = usv' 
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where the elements Z\ of the matrix Z G jinxN contain the ith. component of the jth 
observation minus its mean value over all observations, U G Jin-xn ^nd V G ji^xN 
orthogonal matrices, and S G R^^^ is the diagonal matrix of singular values of Z, the 
eigenvalues of the empirical covariance matrix C = ZZ' sorted in decreasing order. 

In the probabilistic scenario underlying this procedure, the Zj^s are independent samples of 
a Gaussian distribution AA(/i, S), z is an estimate for its mean //, and the principal components 
estimate the principal axes of the covariance matrix S, sorted in decreasing order by the 
fraction of the total variance that they explain. Yet principal components are often sought 
for data that do not quite fit this scenario. Of particular concern to us here is the situation 
where the z^'s form a time series, representing snapshots of the vector z at equidistant times 
tj. In this context, the dimensional reduction by principal components, oriented toward 
data compression, lacks any concept of dynamics: the various snapshots zj are treated as 
independent observations, which renders immaterial even the order in which they are sorted. 
If there is an underlying dynamics, this is neither unveiled nor exploited by the analysis. 

An example is provided by the Empirical Orthogonal Functions (EOFs) [2| -the name 
given to principal components in climate studies-, which take a time series of atmospheric or 
oceanic data, subtract its time average or "climatology" , and find those modes that explain 
the largest share of its variability. These modes may then be assigned suggestive names such 
as "El Nino" or "The north-Atlantic oscillation" and given a dynamical interpretation. Yet 
no dynamics ever entered into their calculation: just the static variability of the data, treated 
as a series of independent, unsorted observations. 

In this paper, we develop an alternative methodology, highly reminiscent of the principal- 
component framework, but with a dynamical core. We seek, as in principal components, 
a hierarchy of manifolds, that we name "principal dynamical components". Attached to 
these manifolds is a model of predictive dynamics. The cost function to minimize has, as in 
principal components, the variability in the unrepresented variables, but also the fraction of 
the variability in the preserved variables that is not explained by the dynamics. Thus the 
dynamical components are characterized not by capturing most of the system's variability, but 
by explaining dynamically its largest possible share. Hence this methodology can be thought 
of as a blend of autoregression analysis [6|, which is used as a reduced dynamical model, and 
principal components, though the criterium for selecting a reduced manifold differs from the 
latter's. 

We first present the new methodology, as a natural extension of principal components, 
in a linear, autonomous framework, with a dynamic manifold given by x = Q'^z, where Qx 
is a fixed n x m orthogonal matrix, and the dynamics by Xj+i = Axj, where A is another 
fixed, mxm matrix. The definition of principal dynamical components results in a minimiza- 
tion problem over both Qx and A. Section [2] presents this problem and provides an efficient 
methodology to solve it. Yet many real problems are not autonomous: climate dynamics, for 
instance, is season-dependent. In Section [3] we extend the methodology to non-autonomous 
situations and, more generally, to accommodate for the presence of exogenous variables and 
external controls, that appear in many engineering applications. Here Qx and A depend 
on time and on those external variables. Section |4] extends the procedure further to handle 
non-Markovian processes, where the dynamics involves more than the immediate past. We il- 
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lustrate the procedure throughout with synthetic data and, in Section [5| we concern ourselves 
with a real application to time series of sea-surface temperature over the ocean. Section [6] 
gives a probabilistic interpretation of the principal dynamical component procedure, which 
provides a conceptual extension to general nonlinear, non-Gaussian settings. The develop- 
ment of effective algorithms for the numerical implementation of this broad generalization 
will be described elsewhere. 

2 The linear, autonomous framework 

The probabilistic set-up for principal component analysis consists of independent observations 
drawn from a Gaussian distribution. The natural extension to time series has a time series 
Zj , j £ [1, . . . , N] drawn from the linear Markovian dynamics 

Here the matrix models autocorrelation, and J\f represents a Gaussian process with mean 
A^Zj and covariance matrix S^. Neither A^ nor are known to us; instead, we seek an 
m-dimensional manifold x = Q'^z and reduced dynamics 

= Axj + ^j, 

where is the prediction error, such that the predictive uncertainty or cost 

N~l N-1 

c = ^ \\zj+i - QxXj+i\\'^ = ^ - QxAQ'^ZjW 

j=i j=i 

is minimal. This is the conceptual basis of what we shall denote linear autonomous principal 
dynamical component analysis. 

It is convenient to introduce some further notation: y for the orthogonal complement of 
X, so that 

Z = [QxQy] ( y ) ' 

where Q = [QxQy] is an orthogonal matrix, and x for the conditional expectation of x: 

Since the dynamics of y is not explained by the model, we have yj+i = 0. 
2.1 Two-dimensional case 

The simplest scenario, appropriate for a first view of the proposed algorithm, has the obser- 
vations Zj in a two-dimensional space, n = 2, and seeks a reduced manifold x of dimension 
m = 1. We introduce the following notation: 
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where, mimicking an application to climate dynamics, A stands for Atlantic and P for Pacific 
spatially-averaged sea-surface temperatures, 

X = Acos{9^) + Psm{9^) , 

y = -As\n{e^) + Pcos(6'*) , 
where the angle 6* defines the direction of the dynamic component x in (A, P) space, and 



where the stretching factor a describes the deterministic component of the reduced dynamics. 
The cost function adopts the form 



c{e,a) 



N-l 

E 

N-l 

E 



2 N-l 
2 N-l 



Xj+i - Xj+i 

Vj+i - Vj+i 



^ 7 = 1 



By contrast, the corresponding cost function for regular principal components in this 2- 
dimensional scenario is 

N 

the amount of variability in the unrepresented variable y. 

The minimization of c can be solved iteratively. If at the beginning of a step we have 
coordinates {x,y), then 

dc 

— = -2 ^ {xj+i - axj) Xj. 



i=i 



Equating |^ to zero yields the standard regression formula 

_ L;=i ■'•.y'-.;-l 

If now we update x and y through a further rotation 

X x cos{9) + y sin(^) , 
y i xsin(^) -|- ycos{9) , 

we have 

dc 

^ = 2a ^ [ i<^Xjyj - ixj+iyj + Xjyj+i)) cos(26') + 



(xj+iXj - yj+iyj + I (y| - x])^ sm{29) 
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Rather than seeking a closed expression for 9 that would make this derivative vanish -notice 
that 6 is implicitly included in the definition of the x and y's-, it is preferable to descend the 
gradient 

dc ^""^ 

QQ " ^ ^^^^^^ ~ ^^^+^y^ + XjVj+i)] 
or, more efficiently, to involve also the second derivative 



89^ 



N-l 



and compute the 9 that minimizes the quadratic local approximation to c: 



dc 




ae 










6»=0 



A little extra care is required when applying the quadratic approximation far from the optimal 



V. if 



ae^ 



< 0, then we must do descent instead: 



9 = -ei 



dc 



9=0 



(2.11 



where > is a chosen learning rate. Also, if 9q is too big, we must limit our step size: 

\9\ = max(|0g|, e), 

where e is the maximum allowable step in ^. It is sensible to relate the values of the two e's 
through 

e 

e« = 



^2 _^ I ac 



which, when applied to (2.1), yields descent steps of size bounded by e, and much smaller 
near the optimal 9. 



To illustrate the procedure just described, we created data from the dynamical model 



Xj+l 



axj + r^rij , 



y 



ryVj: 



for j = 1, . . . , N — 1 (the initial values xi and yi are picked at random), where = 1000, the 

X, 



r]^'^ are independent samples from a normal distribution, and we adopted the values a = 0.6 
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for the dynamic^ and rx = 0.3 and Vy = 0.6 for the amphtudes of the noise in x and y. 



Then we rotated the data through 

Aj = Xj cos{9^,) — Hj sin(6'=i,), 
Pj = Xj sm{9^,) + yj cos{9^), 

with = |, and provided the Aj and Pj as data for the principal dynamical component 
routine. The results are displayed in Figure [T] The first plot shows the "observations" in 
the plane {A, P). These are treated as independent samples in a regular principal component 
analysis; we keep instead track of the sequential order of the observations, represented by the 
dotted lines in the plot. For this data, the first regular principal component, drawn in black, 
is in fact orthogonal to the principal dynamical component, drawn in green. The reason is 
that the total variability has a larger y-component, due to the bigger amplitude of the noise in 
y, while all the variability that is explainable dynamically is in x. This is an extreme example 
where regular principal components yield a leading mode that is absolutely irrelevant from a 
dynamical viewpoint. The other three plots in the figure display the evolution of the estimates 
for a and and the cost function c, as functions of the step- number. The dotted lines, drawn 
for reference, have the exact values of a and 0^, in the data, as well as the unexplainable part 



of the cost, c* 



N 



+ {rym 



0.45. Notice the fast convergence 



to the exact solution, that in this example took 14 steps. 



2.2 The multidimensional case 

For dimensions n bigger than two, we write 

and 



mm c 



Zj+i - Q 







E 



Xj + l 

Vj+i 



Axj 



The minimization problem that defines Q = [QxQy] and A is 

N-l / / \ 2 N-1 

E 

Notice that Q and A are not univocally defined: any pair of orthogonal bases for the optimal 
subspaces represented by x and y will give rise to different Q's and ^'s representing the same 
dynamics. The algorithm proposed below walks nicely around this degeneracy, avoiding 
unnecessary re-parameterizations of the two subspaces. 

The most straightforward methodology decouples the descent steps for A and Q. For A, 
we have 

dc 
dA 



N-l 



-2 {xj+i — Axj) x'j 



(2.2) 



^The value of \a\ needs to be smaller than one for the time series not to blow up. 
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5 10 15 20 ■ 5 10 15 20 

step step 

Figure 1: A two-dimensional example of the basic procedm'e. The first plot displays the 
data points, with dotted lines joining successive observations, and the directions for the first 
regular principal component -in black- and the principal dynamical component -in green-, 
which in this case are orthogonal to each other. The other three plots show the evolution 
of the estimates for the parameters a and for the dynamics and reduced manifold, and of 
and the cost function c -normalized by {N — 1)- as functions of the step-number, with their 
exact values as dotted lines. 



Instead of descending the gradient we can, as in the two-dimensional case, directly solve 



dc 
dA 



0, yielding 







A = XiXo' {XoXo') 


-1 




where 










Xo = [xi, . 


. ,xn-i] and Xi 


= [X2,-- 




For the descent steps in Q, we first note that any orthogonal matrix c 


product of elementary rotations of the form 








/ 1 








... 0\ 







. . . cos(0) . . . 


sin(6') 


. . . 


RkiiO) = 















... -sin(6l) ... 


cos{e) 


. . . 




u 








... ij 
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which act in the plane of the two coordinates k and I, rotating them an angle 9. Then we can, 
in each descent step, pick at random the two indices k and I, and perform a rotation following 
the derivative of c with respect to 9 at 6 = 0. Prom the observation above about degeneracy, 
however, we note that picking both k and / from cither the dynamical coordinates x or their 
orthogonal complement y alone, serves no purpose other than re-parametrization. Then we 
always pick k at random in [1, . . . , m], and adopt I = m + h, with h picked at random in 
[1, . . . , n — m]. In order to consider arbitrary directions in these two manifolds though, we 
first perform a random orthogonal transformation to each: 

X Qlx, y Qly, 

where Qx,y ^-re random orthogonal matrices. 
For each elementary rotation, we have 



dc 
89 



N-l 



m 



p=i 



(2.3) 



and 



89^ 



N-l 



X' 



k 



m 



fc\2 
p) 



(2.4) 



As before, 9 can be computed so as to minimize the quadratic local approximation to c: 



dc 




80 








96*2 


61=0 



with the same caveats on big steps as in the one-dimensional case. 

After performing the optimization, one can, if desired, resolve the degeneracy in the 
description of the dynamical manifold by choosing a natural basis for a;, such as the one 
made out of the principal components of A. For non-normal A's, there are two such bases: 
the eigenvectors of A! A and those of AA! . Both are significant and sorted by sensitivity to 
perturbations: the former gives the directions where initial perturbations yield the highest 
effect; the latter, the directions where these effects manifest themselves after a time-step. 



To create a simple synthetic example for the multidimensional case, we chose n = 5 and 
m = 2, and created data from the dynamical model 

Xj^\ = Axj -\- TxTJj , 
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for j = 1, . . . , N — 1, where = 1000, the ryj'^'s are independent samples from a normal 
distribution -two and three dimensional vectors respectively- and we adopted arbitrarily the 
values 

0.4569 0.3237\ 
-1.0374 1.0378y 

for the dynamics, and Vx = 0.3 and = 0.6 for the amplitudes of the noise in x and y. Then 
we rotated the data through an arbitrary orthogonal matrix. 



A 



[Q.Qy 



with 



-0.7044 -0.3823 -0.3407 -0.1985 -0.4497 ^ ' 
0.5754 -0.1555 -0.1798 0.2477 -0.7423 



and generated the data displayed in the first panel of Figure [2] Running our algorithm 
on these data yields estimates A* and Q* for A and Qx that, as remarked before, are not 
univocally defined. Indeed, the algorithm found 

^ _ . 0.6505 0.2401\ 
-1.1591 0.8685y 



and 

Ql 



-0.8143 -0.3258 -0.2896 -0.2545 -0.2865 
0.4089 -0.2108 -0.2570 0.1873 -0.8290 



quite different in appearance from their exact values above. 

To verify that Qx and Q* span the same plane and that A and A* represent the same 
transformation in the corresponding coordinates, we project the two columns of Q* onto the 
space spanned by those of Qx, through the projection P{Q* y = BQ'^, with B = (Qx)' Qx, 
and define the relative errors 

{QD'-BQ'J ^ \\A-B~^A*B\\ 



= iTTTli ' 



WQxW \\A*\\ 

which vanish only if the two pairs of matrices represent exactly the same reduced manifold 
and dynamics. 

The results are displayed in Figure [2] The first plot shows the first three components of 
the data points zj. The second plot displays the evolution of the normalized cost function 
c as a function of the step-number, with the dotted line displaying the exact value of the 

unexplainable part of the cost, c* = ^f=i^ (j'x\\r]j\\j + ('^2/N|ll) ~ 2r^ + = 1.26. 
The third and fourth plots display the evolution of the errors eg and ba defined above. Notice 
again the fast convergence of the algorithm to the exact solution. 
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step 




10 20 30 40 50 10 20 30 40 50 

step step 



Figure 2: A multidimensional, autonomous example, with n = 5 and m = 2. The first plot 
displays the first three coordinates of the data points, with dotted lines joining successive 
observations, the second plot shows the evolution of the cost function, with its exact value as 
a dotted line, and the third and fourth plots display the evolution of the errors eg and e^. 

2.3 Non-zero means 

We have worked so far under the assumption that all means have been removed from the 
problem: the plane x goes through the origin, and the transformation given by the matrix 
A is linear, not affine. If the observations z have a well-defined mean (that is, if there is not 
a trend over time that makes the local mean of z evolve), these assumptions are fine: it is 
enough to remove from z its mean -the "climatology" of atmosphere-ocean science- ad initio, 
and add it back at the end. However, for the non-autonomous scenario to be described below, 
it will be necessary to consider nontrivial means. In order to have our methodology prepared 
for this more general case, we consider the means in our present autonomous situation too, 
even though they have no practical consequence. Then we write 

and 

Xj+i = Axj + b . 
It is convenient to partition z into its x and y components, 

X Qx^ 5 y Qy^- 



10 



The addition of the mean x, however, is unnecessary, for its effects can be absorbed into the 
drift b. We have the gradients 



dc 
db 



N-l 

-2^{x,+i-{Ax,+b)) (2.5) 



and 



8r 

- = -2 ^ (y.+i - y) , (2.6) 

that can be used either for descent or for the direct calculation of the optimal b and y. 

3 Non- autonomous problems 

We have considered up to now only autonomous problems, where the manifold x and the 
corresponding dynamical model are assumed to be time-independent. Yet there are many 
examples of practical importance where this assumption does not hold. Consider, for in- 
stance, climate-related data, such as monthly averages of sea-surface temperatures at various 
locations, recorded over many years. One should expect much of the dynamics to depend 
on seasonal changes in insolation. We should, accordingly, have a time-dependent dynamical 
model, with a period of one year. Similarly, in long series of economic or financial data, we 
should expect a change in the dynamics as populations or affluence levels change, new mar- 
kets arise, new tools are developed. The corresponding dynamical model should not longer be 
constant, nor periodic as in the seasonal case, but rather evolve slowly, with scale separation 
between the time-scale of the dynamics and that of the evolution of the model itself (without 
the hypothesis of scale separation, little can be inferred statistically from the data, since the 
dynamical model can be adjusted instantly to account for each individual observation). 

To incorporate this into our framework, it is enough to add a qualifying sub-index "t" (or 
more precisely "j", since our time-series are discrete) to the various functions involved: Q, 
A, b and y , plus the requirements of periodicity or scale separation. For instance, Qt should 
satisfy either Qt+T = Qt in the periodic case, or ||(5t+i — Qt|| <C 1 for slowly varying trends. 



In this section, we discuss how to modify the methodology of Section 2.2 so as to make it 
applicable to the non-autonomous linear case. 

The idea is simple: in the notation of the previous section, we are seeking a time-dependent 
orthogonal transformation Qt and mean yt, and a time-dependent dynamical model param- 
eterized by At and bt- To this end, in each descent step, we pick at random a time to and 
propose, in order to update Q, a time-dependent rotation angle 6{t) in the k-l plane, centered 
at t = to. Similarly, we propose time-dependent variations for y, A and b: 

e = aF{t), y = y + vF{t), A = A + B F{t) , b = b + dF{t), 

where F{t) is a given scalar function, centered at tQ, and satisfying the corresponding restric- 
tions: periodicity, slow variation, etc., and the parameters a, a scalar, v and d, vectors, and 
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B, a matrix, are computed by descent of the cost function as before. Then equations (2.3) 



and (2.4) generahze into 

N-l 

= -2 

a=0 



dc 
da 



-2E 



ly^+i {AkXj + bk) + w^) ^ Al (x^+i - A.pXj - bp 



p=i 



and 



N-l 



a=0 



2 ^ {w^^'fx]^^ {A^x, + bk) - 2w^+'w^y^+,Aly^+ 



f2 [iw'fx';At(xP^, -A 



.^(x^+i - ApXj - bp) + {w^y^^A^p-^^ 



p=i 



and equations (2.2), (2.5) and (2.6) into 

N-l 



dc 
dB ~ 

dc 
dd 



and 



dc 
dv 



-2 ^ {xj+i - {Axj + b)) x'j , 
i=i 

N-l 

-2 {xj+i - {Axj + b)) 

N-l 

-2 {yj+i - y) , 



where the weights are given by 

= F{tj) . 

As before, equating the derivatives with respect to B, d and v to zero provides simple closed 
forms for B, d and v, while a can be found through the minimization of a local quadratic 
approximation to c. 



3.1 Exogenous variables 

The time t of the non-autonomous scenario discussed above is just one example of an exoge- 
nous variable: one whose state is known independently at all times, and that may affect the 
dynamics of the z's. Other examples are state variables of a bigger system of which the z's 
are only a small part; and external controls. 

One can collectively denote these exogenous variables s, and apply a straightforward 
generalization of the procedure above, where F is now a function of s rather than the single 
variable t. 
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3.2 Trial functions 

We have not yet considered the issue of how to pick the functions F{s) and corresponding 
weights = F{s^) (here we use s to denote either time or other exogenous variables). In 
this section, we describe a few choices that we have found practical. First of all, for the 
autonomous case, we have the trivial 

F = 1. 

This should still be used in the more general case, to capture the s-independent components 
of Q and A, but must be alternated with other functions F{s) with non-trivial s-dependence. 



3.2.1 One-dimensional functions 

When s represents time, the domain of F{s) must be either the real line -for the trend- or a 
parametrization of the unit circle -for periodic factors such as the seasons. A sensible choice 
for the trend is 

displayed on Figure [Sj depending on the choice of a center sq, picked at random at each 
step, and a mollification parameter, the length-scale L. As L — )• 0, F{s) becomes piecewise 
constant, with a discontinuity at s = sq. For larger values of L, the transition between the 
two constant states is smoothed over an interval of order L. The subtraction of the mean 
F over the observations is intended to decouple the effect of these steps from the ones using 
F = 1, a function concerned only with the mean. At the initial stages of the algorithm, L 
should be large, providing a global perspective; then it should decrease gradually, to tune the 
finer, more local details. 




Figure 3: Plot of the function F{S) 
of L. 



over the interval [—10, 10] for different values 



We can be more specific: calling Lq the largest length scale in s, we need Lq/L steps to 
cover it with transitions of length L. Then the amount dt of algorithmic time spent using a 
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length L should satisfy 
leading to the expression 



dL 
~dt 



L = Ln 



oc L, 



L f\ ''tot 

Tn 



where k is the step number, ktot the total number of steps, and Lf the smallest length scale 
to be used, not to over-resolve the dynamics. 

In the periodic case, we can make an entirely analogous proposal: 



Fis) 



sin(5') 



V4sin^(5/2) + L2' 
where T is the period; see Figure |4j 



S 



27r (s - So) 



(3.1) 




Figure 4: Plot of the function F{S) 
values of L. 



sin(g) 



^4sin2(5/2)+L2 



over the interval [— tt, tt] for different 



Sometimes s can adopt only a discrete set of values: the months of the year, an on-off 
control, etc. In that case, it may be useful to consider signature functions F that are one on 
each of these values at a time, and zero on the others: 



-fiO) — ^mod{j,T),i, 



(3.2) 



where T is the integer period. 

An alternative to the F(s)'s above, which have local derivatives but global effects, are the 
more localized bumps given by 

L3 



Fis) 



and 



(^2 + L2)3/2 
L3 



(4sin2(5/2) + L2)3/2 



14 



displayed in Figures [5] and [6} We can also alternate between the two, or among more proposals 
satisfying different needs. 




-10 -5 5 10 

S 



Figure 5: Plot of the function F{S) = (^g2_^_ 1^2^3/2 over the interval [—10, 10] for different values 
of L. 




values of L. 



Still another natural alternative in the periodic case is to use Fourier components 

27rs 

F,^ = cos(A;5), F^ = smikS), S=—. 

There is no need for a center sq here, since the use of both sines and cosines renders the F 
spatially homogeneous. Each step one must use either F^ or F^ with probability 1/2 each 
(discounting the steps with F = 1), and an integer value for the wave number k. The latter 
should be sampled from a distribution that decays rapidly with k, so as to result into smooth 
composite functions. 
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An advantage of the use of Fourier modes, particularly when only a finite number K of 
modes is allowed, is that one can store the accumulated amplitude added to each mode at the 
various steps, and thus end up with explicit expressions for the non- autonomous dynamical 
matrix A[s) and shift h{s) as finite Fourier series. To obtain a similar bonus for the non- 
periodic case (i.e., for representations of the trend), one would need to replace the functions 
above by others that do not involve a variable length-scale L and random point sq. A simple 
choice is that of monomials 

Fk{s) = s\ 

up to a power K. Then the dynamics is represented by a matrix A and a vector b that depend 
explicitly on s through polynomials of degree K. 

When the problem has more that one kind of variable -some periodic and some trendy, 
for instance-, we can alternate the various types of function F[s) among steps. 



3.2.2 Multidimensional choices 

When s lives in a multidimensional space, we can still use the one-dimensional proposals 
involving sq and L above, but picking the direction of space in which they apply each step 
at random. Yet this is not a very effective procedure when the dimensionality of s is large. 
An alternative is to use radial functions centered at sq, such as 

F{s) = e-^\ rJ\^. 



3.3 A non-autonomous example 

For clarity, we illustrate the non-autonomous procedure through a simple example where 
n = 2, m = 1. We created data from the dynamical model 

Xj+i = ttjXj + bj + r^rjj, 

for j = 1, . . . , N — 1, where = 1000, the r?J'^'s are independent samples from a normal 
distribution, rx = 0.3 and r„ = 0.6, and we adopted the values a,- = I cos^ (^5^) for the 



dynamics, bj = ^ sin ^^^^ for the drift, and yj = | cos ^^^^ for the non-zero mean of y, 
where tj = j and T = 12, mimicking the twelve months of the year that we will find again 
in our application to the sea-surface temperature field in Section [5] Then we introduce, as 
before, "Atlantic" and "Pacific" temperatures 

Aj = Xj cos{9j) — yj sin{6j), 
Pj = Xjsm{ej) +yjCOs{ej), 

with 9j = |sin^^^^, and provide the Aj and Pj as data for the principal dynamical 
component routine. 
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For this example, we have adopted the trial function F from (3.1). The results are 



displayed in Figure [Tj The first plot shows the "observations" in the plane {A,P), with 
the first regular principal component drawn in black, and the 12 first principal dynamical 
components, one for each month, drawn in green. The other plots in the figure display the 
evolution of the normalized cost function c, and the estimated results for a{t), b{t), y{t) and 
9{t) at convergence (we only show the first two periods). The dotted lines, drawn for reference, 
have the exact values of a{t), b{t), y{t) and 9{t) in the data, as well as the unexplainable 



part of the cost, c* = ^ 



N-l 



rxVi 



+ 



V 



0.45. Again, the algorithm 



detects essentially the exact solution to the problem; the number of required steps, about 60, 
is bigger than before, because various different trial functions F{t) are involved, requiring at 
least one step for each. 




^ 
to 

-Q 



10 15 20 
t 




20 40 60 80 100 
step 



Figure 7: A low dimensional (n = 2,m = 1), non-autonomous problem. The first plot 
displays the data points, with dotted lines joining successive observations, the first principal 
component in black, and the twelve monthly first principal dynamic components in green. 
The other plots show the final estimates for a{t), b{t), y{t) and 6{t), for two periods of twelve 
snap-shots each, and the evolution of the cost function, with the exact answers in dotted 
lines. 



4 Higher order processes 

We have considered so far dynamical models without memory, where the current state of the 
system determines its future evolution through the matrix A. Yet many real processes are not 
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well-described by such models. For instance, if the observations consist only of positions xj 
in a system with non-negligible inertia, one would expect a better prediction by using, in lieu 
of the unavailable velocity field, a second order model, x^+i = D{xj,Xj-i). Studying systems 
like this involves no significant change in our procedure: either we extend the phase-space 
from the line of xj's to the plane of pairs {xj,Xj-i) or, equivalently, consider matrices A that 
are rectangular, with twice as many columns as rows. Entirely similar considerations apply 
to higher order processes with longer memory. 

We describe here the non-Mar kovian, non-autonomous case of order r, since the au- 
tonomous scenario is just a special case of the non-autonomous one, and the case with more 
general exogenous variables s is entirely similar. Our reduced dynamical model now adopts 
the form 

r 

Xj+i = Z) = 6 -I- ^ AiXj-i+i, 

i=l 

where the drift b and the matrices Ai,i = l,...,r, as well as the orthogonal matrix Qx 
defining the x's, may in general be time-dependent. Each algorithmic step, we update these 
matrices through 

Ai = Ai + BiF{t), b = b + dF{t), y = y + vF{t), e = aF{t), 

where F{t) is a given trial function as described above. 
The cost function adopts the form 

Af-l 

c = E iiy.-+if + ii^.-+i-^f ' 

j=r 

since the first xi,.. . ,Xr are not specified by the dynamics. Then 

dc 

^ = -2 E u;^ {xj+i - D) , 



for every h = 1,. . . 



and 



dc 
dd 



N-l 



dc 

dv 



3=r 



-2 E ivj+i - y) , 
where = F{tj). 

It is possible to get explicit expressions for Bi,. . . ,Br by equating all to zero. We 
introduce the m x m matrices 

N-l N-l 
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where Xq''^ = (^Xq''^^ . In terms of these, we get a block system of hnear equations: 



BiX^'^ + B2X^'' + ■■■ + BrX^'' =Xi, 



-2,1 



r,l _vl 



BiXq''^ + -82^0'^ + • • • + B^Xq^ 



.2,2 



r,2 



BiXY + B2Xl^' + ■■■ + BrX'/ =Xl, 



or 



[Bi, B2, ■ 


•• Br] 


^0 

^0 


^1,2 

^2,2 


• ^'1 

v2,r 

• ^0 








Yr,2 
^0 


• ^0 -1 



[xl, Xf, ■ ■ ■ XI] , 



which determines the matrices Bi, . . . ,Br. 

For the angle a we proceed as in the previous sections, though a quadratic approximation 
to c, using 



dc 
da 



N-l 



a=0 



and 



3=r 



N-l 



2E "^^■+'2/^i(^). + E(^«)pH+i-(^)p) 



do? 



a=0 



p=l 



where 



Vj-i+l ' (-^0 



Again we choose, for the sake of clarity, to illustrate the procedure in its simplest possible 
setting, which is autonomous, with n = 2, m = 1, and r = 3, the order of the Non-Markovian 
process. We created data from the dynamical model 

Xj+i = aiXj + a2Xj_i + a3Xj-2 + rxTjj, 

for j = 3, . . . , N — 1, where = 1000, the TyJ'^'s are independent samples from a normal 
distribution, and we adopted the values ai = 0.4979,02 = —0.2846,03 = 0.1569 for the 
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dynamics and r^. = 0.3 and ry = 0.6 for the amplitudes of the noise in x and y. Then, as 
before, we define 

Aj = Xj cos{9^) — Uj sin(6'=i,), 
Pj = Xj sm{9^,) + Uj cos{9^), 

with 0* = |, and provide the Aj and Pj as data for the principal dynamical component 
routine. The results are displayed in Figure [8] Again the procedure converges to the exact 
answer, this time for all elements of the multi-step dynamics. As in the first example, the 
first regular principal component is orthogonal to the principal dynamical component, thus 
capturing none of the system's dynamics. 




Figure 8: A multi-step process of order 3. The first plot displays the data points, with dotted 
lines joining successive observations, their regular first principal component in black and 
their first principal dynamical component in green. The other plots show the evolution of the 
estimates for ai, 02, as and 9, as well as of the cost function, with their exact values and the 

exact unexplainable part of the cost, c=k = "^^=1 (j'xVj^ + {^yV^^ ~ ''x + ''y = 0.45, 
displayed in dotted lines. 



5 A real application: the global sea-surface temperature field 

To see the workings of the new procedure on real data, we have chosen a topic of present 
concern: the estimation of climatic variations and trends. For this, we use a database of 
monthly averaged extended reconstructed global sea surface temperatures based on COADS 
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data (see [4j) from January 1854 to October 2009, and ask whether we can extract from these 
a reduced low dimensional dynamical model. A few before-hand considerations are in order: 



• Climate dynamics, a real pressing issue, is treated just as an illustration in this method- 
ological paper. A far more in-depth treatment of how much principal dynamics com- 
ponents can help increase climate predictability and elucidate its causal relations will 
be pursued elsewhere. 

• The ocean is not an isolated player in climate dynamics: it interacts with the atmosphere 
and the continents, and is also affected by external conditions, such as interannual 
variations in solar radiation and human- related release of CO2 into the atmosphere. 
The latter are examples of slowly varying external trends that fit naturally into our 
non-autonomous setting -the seasonal variations giving its periodic component. As for 
the land and atmosphere, their dynamics is typically faster than that of the oceans, and 
can be conceptually divided into two components: a part that is slaved to the state of 
the ocean's surface temperature -and hence can in principle be included in its dynamical 
model-, and one that can be treated as external noise. Including explicitly land and 
atmospheric observations involves at least two further challenges, that will be pursued 
elsewhere: handling data with disparate units -such as atmospheric pressure, ice extent 
and ocean temperature-, and allowing for multiple time-scale dynamical models. 

• Even within the ocean, the surface temperature does not evolve alone: it is carried by 
currents, and it interacts through mixing with lower layers of the ocean. As mentioned 
in Section [4j one way to account for unobserved variables is to make the model non- 
Markovian: discrete time derivatives of the sea-surface temperature provide indirect 
evidence on the state of those hidden variables. 

We have adopted as our dataset the sea-surface temperature monthly means between 
January 1854 to October 2009 of the 50 points displayed on the map in Figure |9} covering 
much of the world oceans in a roughly homogeneous manner. 

In order to apply our methodology to the data, we need to select a class of trial functions 



from Subsection 3.2, the dimension m for the reduced manifold x, and the order r of the 



non-Mar kovian process. The trial functions for the periodic component that we have used 



for these runs are the monthly discrete (^-functions from (3.2), with T = 12. This takes to a 
new depth the idea behind the use of a "monthly climatology" in climate studies: not only 
the climatological mean is computed independently for each month, but also the dynamical 
model and manifold may change significantly from month to month. In the runs reported 
here, we have not modeled any inter-annual trend. 

The following physical considerations suggest picking r = 3 for the order of the Markov 
process. A simplified conceptual model for the upper mixed layer of the ocean is that of a 
rotating shallow layer of water, forced by the atmosphere from above and the deep ocean from 
below. In such model, the active dynamical variables are the two horizontal components of the 
velocity and the layer's thickness. The surface temperature can be thought of as an emergent 
of the evolution of these three variables and the external forcing. Conservation of mass 
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Figure 9: The 50 points on the ocean used for the procedure. 



and horizontal momentum, the core dynamics of the layer, are three differential equations, 
each involving one time derivative. Hence reducing the system to a single variable -the 
temperature, the only one available in the data- yields a third order differential equation: two 
time derivatives relate to the evolution of gravity waves, the third to the potential vorticity. 
In our discrete setting, this corresponds to a Markov process of order three. 



Figure 10 illustrates a line of reasoning for choosing the values of m and r. The figure 
on the left shows, for a fixed m = 4, the evolution of the final error when we move the 
order of the non Markovian process from r = 1 to r = The dotted line shows the 
error X^^^s Sf, where the S"s are the singular values of the real dataset, with the monthly 
climatology subtracted. We find for r = 3 the steepest drop of the final error, consistent 
with our reasoning above. Therefore we pick r = 3 for the order of our non Markovian 
process. In the figure on the right, we observe, for this fixed r = 3, the evolution of the 
final error when m = 1, . . . , 6. The isolated points correspond to the sum of squared singular 
values, YljLm+i • observe that for m = 4 this error matches almost exactly the 
one from the dynamical components. This can be interpreted in the following way: for 



^This final error is calculated for a number of steps such that the difference between the final error and 
the error 1000 steps before is less than 0.01. Therefore the number of steps used for each value of r may be 
different. 



22 



smaller values of m, accounting for the dynamics allows us to reduce the information loss 
even beyond the theoretical maximum -for autonomous settings- provided by the singular 
value decomposition. Beyond m = 4, on the other hand, the biggest share in the further 
reduction of information loss is probably due to the increased bare dimensionality of the 
model, more than to a further refinement of the dynamics. Hence we pick m = 4 for the 
dimension of our reduced dynamical manifold. 




Figure 10: Predictive uncertainty as a function of the dimension m of the reduced dynamical 
manifold and the order r of the process. The figure on the left shows the final error for a fixed 
m = 4 and r = 1, . . . , 6. The dotted line is the sum X^^^s Sf of the squared singular values 
of the data, with the monthly climatology removed. The figure on the left shows the final 
error for a fixed r = 3 and m = 1, . . . , 6, with the isolated points displaying the corresponding 
uncertainty in the standard principal component procedure, X^i^m+i ^i- 



Next we display various results for the chosen parameters, m = 4 and r = 3. Although 
the dataset used is defined over a wide range of time, we display our results on the time 
window from January 1991 to January 1999, which includes three El Nino years, represented 
by vertical lines; one of them, in 1998, the strongest ever recorded. Figure 11 shows the 
evolution of the four components of the manifold x in solid lines and, in dotted lines, the 
same components predicted from the prior three months. 

One question one may ask is whether the reduced dynamical manifold x is dominated by 
a small set of locations on the ocean. This would be manifested in having the columns of 
Qx dominated by a few significant rows. Yet the columns of Qx do not have a meaning per 
se: X is a four dimensional manifold, but each component lacks individual meaning. To 
fix a reference frame in the manifold x, we resort to the matrix Ai: its four left principal 



components U in Ai = USV provide a natural set of coordinates in x-space. Figure 12 



displays the first four columns of Qx{t)U{t),t = 1, ... , 12. We observe between four and six 
dominant peaks; the four clearest ones corresponding to the points 19, 24, 37 and 41 in Figure 
M This suggests that a reduced dynamical model for the ocean could be built from four to 
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Figure 11: Real and predicted dynamical components (the manifold x). The vertical lines 
mark El Nino years. 



six selected locations. Notice that these four points are on the Pacific ocean, in locations that 
one would naturally associate with the strongest El Nino signals. 

Figure 13 shows the observed ocean surface temperature for these four points, comparing 
them with the ones predicted by the algorithm, in dotted lines. We see that the approximation 
is quite sharp, particularly near El Nino years, where changes of temperatures are most 
significant. Even though we have chosen to plot only these four temperatures, all of the 50 
points used are well-predicted by the procedure. 

Finally, we monitor the evolution of a measure of the global anomalies associated with 
El Nino. To this end, we compute a discrete analogue of the running 3-month mean SST 
anomaly in the El Nino regions [5]. In particular, we average the temperatures on the points 
16, 17, 24, 28, 29, 32, 33, 37, 41 and 42 on the map in Figure [9j for a time window from 
February 1964 to October 2009. These 10 points are not all strictly included in what are 
known as El Nino regions (there are four of them, 1+2, 3, 4 and 3.4), but they are the 
closest on our discrete map to the union of all of them. We observe in Figure 14 that the 
warm (positive) peaks coincide with El Nino years. The cold (negative) peaks correspond to 
La Niiia years. In dotted line we have plotted the predicted values of these SST anomalies 
generated by the principal dynamical component procedure. 
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Points in the ocean 

Figure 12: Dependence of the four components of the reduced manifold x on the individual 
locations on the ocean for the twelve months of the year. A natural coordinate system in x is 
the one provided by the principal components of the first dynamical matrix Ai. Notice that 
four to six points on the ocean dominate the dynamics. 

6 Probabilistic perspective and extension to nonlinear dy- 
namics 

Throughout this article, we have defined and developed the principal dynamical component 
procedure in terms of the minimization of a specific cost function: the sum of squares of 
the prediction errors. In this section, we assign a meaning to this cost in terms of the log- 
likelihood function of a probabilistic model. Framing the principal dynamical component 
procedure in a probabilistic setting has two main advantages: to permit a more thorough 
interpretation, and to extend its applicability beyond the linear models developed in this 
article. We sketch such generalization in this section; its algorithmic implementation, under 
current development, will be presented elsewhere. 

Generally, a probabilistic model for a time series Zj G i?" involves the transition proba- 
bility density 

T{zj+i\zj) . 

(This corresponds to the Markovian, autonomous scenario, the only one that we address in 
this section. The extension to non-autonomous and non-Markovian cases, involving a tran- 
sition probability density of the form T{zjj^i\zj, Zj^i, . . . , Zj^r, t, s), is straightforward). The 
principal dynamical component proposal considers a dimensional reduction of such transition 
probability density, using the following elements: 
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Figure 13: Observed and predicted ocean surface temperature for four selected points. The 
vertical lines mark El Niiio years. 



• A coordinate system z = z{x,y), x € i?™, y S with corresponding projection 
operators Px and Py: 

X = Px{z{x,y)), y = Py{z{x,y)). 

• A reduced dynamical model given by a transition probability density in K^: 

d{xj+i\xj) . 

• A probabilistic embedding 

e{y\x). 

The transition probability density for z is then given by 

T{zj+i\zj) = J (zj+i) e{yj+i\xj+i) d{xj+i\xj), 

where x = Px{z), y = Py{z), and J{z) is the Jacobian determinant of the coordinate map 
z {x,y). 

A natural measure of the goodness of the model is the log-likelihood function 

N-l 

L= J^log [T(z,-+i|z,-)]- 
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Figure 14: Observed and predicted 3-month mean SST anomalies from February 1964 to 
October 2009, quantifying El Nifio and La Nina intensities. Only El Nino years are marked 
with vertical lines; La Niiia years correspond to strong negative anomalies. 



In particular, in the setting of Section [2| we have the projections 

P^{z) = Q'^z, Py{z) = Q'yZ, 



(6.1) 



where Q = [QxQy] is orthogonal, so J{z) = 1. The embedding and reduced dynamics are 
given by the isotropic Gaussians 



and 



e{y\x) =M{0,a^lN-m) 



d{xj+i\xj) = J\f{Axj,a'^Im), 



(6.2) 



(6.3) 



where Ik stands for the k x k identity matrix. Consequently, the log-likelihood function is 
given by 



N-l 



1 

2^ 



log(27r) + nlog(cr) + ^ {\\xj+i - Axj\\'' + ny^+n 



Thus maximizing the log-likelihood L over Q and A is equivalent to minimizing the cost 
function 



N 



1 w-i 
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that we have used throughout the paper; the corresponding optimal value of a is given by 



This interpretation immediately suggests the following generalization, which remains 
within the realm of Gaussian distributions and linear maps: keep the orthogonal projections 



in (6.1), but replace the embedding ( |6.2[ ) and dynamical model (6.3) by the more general 

eiy\x) = M{0,^y), 
d{xj+i\xj) = J\f{Axj,Y,x), 

where 'Ex and T,y are general covariance matrices. The resulting log- likelihood function is 

N-l 

^ = E "2 N((2vrr|S,||S,|) + - - Ax,)) + (y,+i, ^y.+i)] . 

i=i 

This formulation has the advantage of providing a natural ranking of the coordinates x and 
y, through the principal components of the corresponding covariance matrices. 

More generally, one can propose different, typically nonlinear, families of distributions, 
projections and dynamical models, and maximize the corresponding log-likelihood function. 
The proposed distributions can be given parametrically, in which case the maximization of 
the log-likelihood is over their parameters, or non-parametrically, for instance as an extension 
of the methodology proposed in [3]. Thus the principal dynamical component methodology 
extends naturally to very general scenarios, with nonlinear reduced dynamical manifolds, 
stochastic, nonlinear dynamical models, and non-Gaussian embeddings. This extension, how- 
ever, goes beyond the scope of this paper, and will be pursued elsewhere. 



7 Conclusions 

A new methodology has been developed for the dimensional reduction of time series. The pro- 
cedure seeks a low dimensional manifold x and a dynamical model Xj+i = D{xj, xj-i, ■ ■ ■ ,t) 
that minimize the predictive uncertainty of the series. The procedure has been successfully 
tested on synthetic data, and illustrated with a real application to time series of sea-surface 
temperature over the ocean. Finally, a probabilistic interpretation of the principal dynamical 
component procedure was proposed, providing a conceptual extension to general nonlinear, 
non-Gaussian settings. 
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